ProjecTILs case study - MC38 TILs
In this case study, we will build an integrated scRNA-seq analysis workflow to interpret the transcriptional and clonal structure of tumor-infiltrating T cells in MC38 colon adenocarcinoma (data from Xiong et al., 2019).
The main R packages and methods employed in this workflow are:
- Seurat - for storing, processing and visualizing scRNA-seq data
- ProjecTILs - for the projection of scRNA-seq data into a reference TIL atlas
- scRepertoire - for the analysis of TCR-seq data
Note that scRepertoire requires R version 4.0 or higher, so you will need R 4.x installed to run this case study.
R Environment
Check & load R packages
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
library(renv)
renv::restore()
library(gridExtra)
library(ggplot2)
library(plotly)
library(ProjecTILs)
library(scRepertoire)
scRNA-seq data preparation
Download scRNA-seq data from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7919/). After downloading and unpacking (you will need curl and unzip), you should get three files: matrix.mtx, genes.tsv and barcodes.tsv.
files <- c("E-MTAB-7919.processed.1.zip", "E-MTAB-7919.processed.2.zip", "E-MTAB-7919.processed.3.zip")
matrix_dir <- "./input/Xiong_TIL/matrix"
system(sprintf("mkdir -p %s", matrix_dir))
for (i in seq_along(files)) {
    data_path <- sprintf("https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-7919/%s",
        files[i])
    system(sprintf("curl %s --output %s/data.zip", data_path, matrix_dir))
    system(sprintf("unzip -o %s/data.zip -d %s", matrix_dir, matrix_dir))
}
system(sprintf("rm %s/data.zip", matrix_dir))
Load scRNA-seq data and store as Seurat object
projectID <- "Xiong_TIL"
libIDtoSampleID <- c("Mouse 1", "Mouse 2", "Mouse 3", "Mouse 4")
names(libIDtoSampleID) <- 4:7
exp_mat <- Read10X(matrix_dir)
querydata <- CreateSeuratObject(counts = exp_mat, project = projectID, min.cells = 3,
min.features = 50)
querydata$Sample <- substring(colnames(querydata), 18)
table(querydata$Sample)
4 5 6 7
1759 1746 1291 1386
querydata$SampleLabel <- factor(querydata$Sample, levels = c(4:7), labels = libIDtoSampleID)
table(querydata$SampleLabel)
Mouse 1 Mouse 2 Mouse 3 Mouse 4
1759 1746 1291 1386
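The sample assignment above relies on the barcode layout: 16 nucleotides, a dash, then the library suffix, so the suffix starts at character 18. A minimal base-R sketch of that mapping, using made-up barcodes (the barcode sequences are hypothetical; only the suffix-to-mouse mapping comes from the workflow):

```r
# Hypothetical 10x-style barcodes: 16 nt + "-" + library suffix (assumed layout)
barcodes <- c("AAACCTGAGCGTTGCC-4", "AAACCTGCAGTCGATT-5",
              "AAACGGGTCTTGCATT-6", "AAAGATGAGGCTCATT-7")

# Same mapping as in the workflow: library suffix -> mouse label
libIDtoSampleID <- c("Mouse 1", "Mouse 2", "Mouse 3", "Mouse 4")
names(libIDtoSampleID) <- 4:7

# Everything from character 18 onward is the library suffix
sample_id <- substring(barcodes, 18)

# Convert the suffix into a labelled factor
sample_label <- factor(sample_id, levels = 4:7, labels = libIDtoSampleID)

print(data.frame(barcode = barcodes, Sample = sample_id, SampleLabel = sample_label))
```

If your barcodes use a different layout (for example a longer cell barcode), the start position passed to substring would need to change accordingly.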
scTCR data preparation
Download scTCR data from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7918/)
data_path <- "https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-7918/E-MTAB-7918.processed.1.zip"
tcr_dir <- "./input/Xiong_TIL/TCR"
system(sprintf("mkdir -p %s", tcr_dir))
system(sprintf("curl %s --output %s/data.zip", data_path, tcr_dir))
system(sprintf("unzip -o %s/data.zip -d %s", tcr_dir, tcr_dir))
Mice 1 to 4 (sample IDs 4 to 7) correspond to TCR-seq libraries 35 to 38
libIDtoSampleID_VDJ <- 4:7
names(libIDtoSampleID_VDJ) <- 35:38
vdj.list <- list()
for (i in seq_along(libIDtoSampleID_VDJ)) {
    s <- names(libIDtoSampleID_VDJ)[i]
    vdj.list[[i]] <- read.csv(sprintf("%s/filtered_contig_annotations_%s.csv", tcr_dir,
        s), as.is = TRUE)
    # Rename barcodes to match scRNA-seq suffixes
    vdj.list[[i]]$barcode <- sub("\\d$", "", vdj.list[[i]]$barcode)
    vdj.list[[i]]$barcode <- paste0(vdj.list[[i]]$barcode, libIDtoSampleID_VDJ[i])
    vdj.list[[i]]$raw_clonotype_id <- paste0(vdj.list[[i]]$raw_clonotype_id, "-",
        libIDtoSampleID_VDJ[i])
    vdj.list[[i]]$SampleLabel <- libIDtoSampleID_VDJ[i]
}
Combine alpha and beta chains using the combineTCR function from scRepertoire
# Using parameters removeNA=T and removeMulti=T will remove cells with multiple
# a-b combinations
combined <- combineTCR(vdj.list, samples = libIDtoSampleID_VDJ, ID = names(libIDtoSampleID_VDJ),
cells = "T-AB", removeNA = T, removeMulti = T)
for (i in seq_along(combined)) {
    combined[[i]] <- stripBarcode(combined[[i]], column = 1, connector = "_", num_connects = 3)
}
The function ‘combineExpression’ of scRepertoire incorporates clonotype information into a Seurat object, and creates the categorical variable ‘cloneType’, which discretizes the clonotype frequencies
querydata <- combineExpression(combined, querydata, cloneCall = "gene", groupBy = "none")
We have now paired expression and TCR data for the query samples, and loaded them into a unified Seurat object. We can proceed to project the data onto the reference atlas.
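Pairing the two modalities only works if each TCR barcode matches a barcode in the expression data, which is why the barcode suffixes were rewritten earlier. The sketch below illustrates that idea on toy data with a plain merge; it is not scRepertoire's combineExpression, and the toy tables, the "-1" suffix and the nFeature_RNA values are assumptions made for illustration.

```r
# Toy VDJ contig table for one library; barcodes assumed to end in "-1"
vdj <- data.frame(
  barcode = c("AAACCTGAGCGTTGCC-1", "AAACCTGCAGTCGATT-1"),
  raw_clonotype_id = c("clonotype1", "clonotype2"),
  stringsAsFactors = FALSE
)
sample_id <- 4  # scRNA-seq suffix for this library (Mouse 1 in the workflow)

# Drop the trailing digit, then append the scRNA-seq sample suffix
vdj$barcode <- sub("\\d$", "", vdj$barcode)
vdj$barcode <- paste0(vdj$barcode, sample_id)

# Toy expression metadata keyed by scRNA-seq barcodes (values are made up)
expr_meta <- data.frame(
  barcode = c("AAACCTGAGCGTTGCC-4", "AAACCTGCAGTCGATT-4", "AAACGGGTCTTGCATT-4"),
  nFeature_RNA = c(1200, 950, 1430),
  stringsAsFactors = FALSE
)

# Left join: cells without a detected TCR keep NA as clonotype
paired <- merge(expr_meta, vdj, by = "barcode", all.x = TRUE)
print(paired)
```

Cells that remain NA after the join correspond to cells with expression data but no usable TCR, which is also why some cells appear grey in clonotype plots.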
ProjecTILs
Load the reference atlas
ref <- load.reference.map()
[1] "Loading Default Reference Atlas..."
[1] "/Users/mass/Documents/Projects/Github/ProjecTILs_CaseStudies/ref_TILAtlas_mouse_v1.rds"
[1] "Loaded Reference map ref_TILAtlas_mouse_v1"
Project query data (with clonotype information stored as metadata) onto the TIL reference atlas
query.projected <- make.projection(querydata, ref = ref, ncores = 2)
[1] "Using assay RNA for query"
[1] "135 out of 6182 ( 2% ) non-pure cells removed. Use filter.cells=FALSE to avoid pre-filtering (NOT RECOMMENDED)"
[1] "Aligning query to reference map for batch-correction..."
Projecting corrected query onto Reference PCA space
Projecting corrected query onto Reference UMAP space
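The log above mentions projecting the query onto the reference PCA space. As a conceptual illustration only (this is not the ProjecTILs implementation, and the matrices are random toy data), projecting new cells into an existing PCA means reusing the centering and loadings learned on the reference rather than refitting the PCA:

```r
set.seed(1)

# Toy "reference": 100 cells x 5 genes (random values, for illustration only)
genes <- paste0("gene", 1:5)
reference <- matrix(rnorm(500), nrow = 100, dimnames = list(NULL, genes))

# Toy "query": 10 new cells measured on the same genes
query <- matrix(rnorm(50), nrow = 10, dimnames = list(NULL, genes))

# Fit the PCA on the reference only
ref_pca <- prcomp(reference, center = TRUE, scale. = FALSE)

# Project the query using the reference centering and loadings
query_pcs <- predict(ref_pca, newdata = query)

# Equivalent explicit computation: subtract reference gene means, multiply by loadings
manual <- sweep(query, 2, ref_pca$center) %*% ref_pca$rotation

print(all.equal(query_pcs, manual, check.attributes = FALSE))
```

Because the reference coordinates are left untouched, query cells from different datasets can be compared in the same fixed space.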
Visualization of projected data.
T cells are projected mostly in the Treg, CD8 Terminal exhausted (CD8_Tex) and Precursor exhausted (CD8_Tpex) areas, with some clusters in the T-helper (Th1) and CD8 Effector memory areas, and to a lesser extent in Naive-like and CD8 Early-Activated areas.
p1 <- plot.projection(ref)
p2 <- plot.projection(ref, query.projected, linesize = 0.5, pointsize = 0.5)
grid.arrange(p1, p2, ncol = 2)
Visualize the projections per sample. Broadly, the distribution across the atlas is similar for the four mice, with some variation in the frequency of Effector Memory T cells.
plots <- list()
sample_names <- unique(query.projected$SampleLabel)
for (sample_i in seq_along(sample_names)) {
    sample <- sample_names[sample_i]
    plots[[sample_i]] <- plot.projection(ref, query.projected[, query.projected$SampleLabel ==
        sample]) + ggtitle(sample)
}
grid.arrange(grobs = plots, ncol = 2)
Classify projected T cells into cell subtypes/states
query.projected <- cellstate.predict(ref = ref, query = query.projected)
table(query.projected$functional.cluster) #Cell state assignment is stored in the 'functional.cluster' metadata field
CD4_NaiveLike CD8_EarlyActiv CD8_EffectorMemory CD8_NaiveLike
31 161 622 356
CD8_Tex CD8_Tpex Tfh Th1
1806 440 8 554
Treg
2069
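The counts above can be converted to percentages by hand, which is a quick sanity check on the composition plot. The sketch below simply retypes the counts from the table; they sum to 6047 cells, consistent with 6182 query cells minus the 135 removed by the filter.

```r
# Cell counts per state, copied from the table above
counts <- c(CD4_NaiveLike = 31, CD8_EarlyActiv = 161, CD8_EffectorMemory = 622,
            CD8_NaiveLike = 356, CD8_Tex = 1806, CD8_Tpex = 440,
            Tfh = 8, Th1 = 554, Treg = 2069)

# Percentage of projected cells assigned to each state
pct <- round(100 * prop.table(counts), 1)

print(sum(counts))
print(sort(pct, decreasing = TRUE))
```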
Look at distribution of T cells in terms of cell states.
plot.statepred.composition(ref, query.projected, metric = "Percent")
We can check the gene expression profile of the cells assigned to each state (yellow), and compare them to those of the reference states (black).
plot.states.radar(ref = ref, query.projected, min.cells = 30)
For example, cells projected into the Tex and Tpex regions express Tox and Pdcd1, while only Tex cells express Gzmb and Havcr2, as expected.
Clonality analysis
Now, let’s see where the expanded clones are located within the map:
levs <- c("Hyperexpanded (100 < X <= 500)", "Large (20 < X <= 100)", "Medium (5 < X <= 20)",
"Small (1 < X <= 5)", "Single (0 < X <= 1)", NA)
palette <- colorRampPalette(c("#FF4B20", "#FFB433", "#C6FDEC", "#7AC5FF", "#0348A6"))
query.projected$cloneType <- factor(query.projected$cloneType, levels = levs)
DimPlot(query.projected, group.by = "cloneType") + scale_color_manual(values = c(palette(5)),
    na.value = "grey")
Larger (most highly expanded) clones are concentrated in the Tex/Tpex area. This is largely explained by the fact that sustained antigenic stimulation drives immunodominant clones into the Tox+-driven exhausted lineage (Tex - Tpex region).
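The cloneType categories plotted above are size bins. As a rough sketch of how such bins can be derived from clonotype sizes, base R's cut() reproduces the "a < X <= b" intervals named in the labels. The clone sizes below are hypothetical, and scRepertoire's own binning may differ in detail (for example, depending on how frequencies are computed).

```r
# Hypothetical clonotype sizes (number of cells per clonotype)
clone_size <- c(1, 2, 5, 6, 20, 21, 100, 101, 350)

# Breaks and labels follow the cloneType labels used in this case study
breaks <- c(0, 1, 5, 20, 100, 500)
labels <- c("Single (0 < X <= 1)", "Small (1 < X <= 5)", "Medium (5 < X <= 20)",
            "Large (20 < X <= 100)", "Hyperexpanded (100 < X <= 500)")

# cut() uses right-closed intervals by default, matching "a < X <= b"
clone_type <- cut(clone_size, breaks = breaks, labels = labels)

print(data.frame(clone_size, clone_type))
```

Note that values on a boundary (5, 20, 100) fall into the lower bin, because each interval is closed on the right.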
Let’s check the raw numbers
table(query.projected@meta.data$cloneType, query.projected@meta.data$functional.cluster)
CD4_NaiveLike CD8_EarlyActiv
Hyperexpanded (100 < X <= 500) 0 0
Large (20 < X <= 100) 0 1
Medium (5 < X <= 20) 0 2
Small (1 < X <= 5) 0 6
Single (0 < X <= 1) 23 64
CD8_EffectorMemory CD8_NaiveLike CD8_Tex
Hyperexpanded (100 < X <= 500) 1 0 86
Large (20 < X <= 100) 99 0 413
Medium (5 < X <= 20) 46 1 318
Small (1 < X <= 5) 98 10 222
Single (0 < X <= 1) 140 162 143
CD8_Tpex Tfh Th1 Treg
Hyperexpanded (100 < X <= 500) 31 0 0 3
Large (20 < X <= 100) 97 0 0 8
Medium (5 < X <= 20) 69 0 26 138
Small (1 < X <= 5) 56 2 118 242
Single (0 < X <= 1) 32 1 172 307
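To put a number on the pattern in the table above, we can compute, for a few states, the fraction of TCR-annotated cells that belong to clones of more than 20 cells (the Large and Hyperexpanded bins). The sketch retypes four columns of the table; row names are shortened for readability.

```r
# Counts copied from the table above (rows: clone size bins, columns: states)
tab <- cbind(
  CD8_EffectorMemory = c(1, 99, 46, 98, 140),
  CD8_Tex            = c(86, 413, 318, 222, 143),
  CD8_Tpex           = c(31, 97, 69, 56, 32),
  Treg               = c(3, 8, 138, 242, 307)
)
rownames(tab) <- c("Hyperexpanded", "Large", "Medium", "Small", "Single")

# Fraction of cells per state that sit in clones of more than 20 cells
frac_large <- colSums(tab[c("Hyperexpanded", "Large"), ]) / colSums(tab)

print(round(frac_large, 3))
```

With these counts, roughly 42% of CD8_Tex and 45% of CD8_Tpex cells belong to large clones, against under 2% of Tregs.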
Plot clonal size by functional cluster
# melt() is taken from reshape2 here (assumed source); it is qualified explicitly because reshape2 is not attached above
meta <- reshape2::melt(table(query.projected@meta.data[!is.na(query.projected@meta.data$Frequency),
    c("functional.cluster", "cloneType")]), varnames = c("functional.cluster", "cloneType"))
ggplot(data = meta, aes(x = functional.cluster, y = value, fill = cloneType)) + geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + scale_fill_manual(values = c(palette(5)),
    na.value = "grey")
This confirms the initial observation that Tex and Tpex present the largest clonal expansion.
We can highlight specific clonotypes on the reference atlas (here those with more than 35 cells):
clone_call <- "CTaa"  # column/variable to use as clonotype ID; here CTaa, the paired TCR CDR3 amino acid sequences
cutoff <- 35  # keep clonotypes with more than this many cells
clonotypeSizes <- sort(table(query.projected[[clone_call]])[table(query.projected[[clone_call]]) >
cutoff], decreasing = T)
bigClonotypes <- names(clonotypeSizes)
plots <- list()
for (i in seq_along(bigClonotypes)) {
    ctype <- bigClonotypes[i]
    plots[[i]] <- plot.projection(ref, query.projected[, which(query.projected[[clone_call]] ==
        ctype)]) + ggtitle(sprintf("%s - size %i", ctype, clonotypeSizes[ctype]))
}
grid.arrange(grobs = plots, ncol = 2)
The majority of clones tend to span Tex and Tpex states. Indeed, Tcf7+ precursor exhausted Tpex cells self-renew and give rise to more differentiated Tex effector cells.
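One way to quantify how strongly two states share clones is an overlap index computed on per-clonotype cell counts. The sketch below implements the Morisita-Horn form of the index in base R on hypothetical counts; it is meant to convey the idea, and scRepertoire's own implementation may use a different variant of the Morisita index.

```r
# Morisita-Horn overlap between two vectors of clonotype counts.
# x and y must list the same clonotypes in the same order.
morisita_horn <- function(x, y) {
  X <- sum(x)
  Y <- sum(y)
  2 * sum(x * y) / ((sum(x^2) / X^2 + sum(y^2) / Y^2) * X * Y)
}

# Hypothetical cell counts per clonotype in three states
tex   <- c(cloneA = 40, cloneB = 25, cloneC = 10, cloneD = 0)
tpex  <- c(cloneA = 12, cloneB = 9,  cloneC = 2,  cloneD = 0)
naive <- c(cloneA = 0,  cloneB = 0,  cloneC = 0,  cloneD = 15)

print(morisita_horn(tex, tpex))   # shared clones in similar proportions: high overlap
print(morisita_horn(tex, naive))  # no shared clones: zero overlap
print(morisita_horn(tex, tex))    # identical repertoires: overlap of one
```

The index depends on relative clone proportions, not on the absolute number of cells, so states of very different sizes can still be compared.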
The clonal overlap (Morisita similarity index) implemented in scRepertoire confirms this pattern:
meta.list <- expression2List(query.projected, group = "functional.cluster")
clonalOverlap(meta.list, cloneCall = "gene", method = "morisita") + theme(axis.text.x = element_text(angle = 90,
    hjust = 1, vjust = 0.5))
We can also visualize the top CD8 clones, and how they are shared across functional clusters, using scRepertoire’s alluvial plot
compareClonotypes(meta.list, numbers = 6, samples = c("CD8_EffectorMemory", "CD8_Tpex",
    "CD8_Tex", "CD8_NaiveLike", "CD8_EarlyActiv"), cloneCall = "aa", graph = "alluvial") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Conclusions
Projection of the MC38 TILs onto the reference atlas showed that these T cell samples mostly consist of exhausted CD8 T cells (CD8_Tex), precursor-exhausted CD8 T cells (CD8_Tpex), and Tregs, typical of highly immunogenic, “hot” tumors. Also, the majority of expanded clones were found spanning the CD8_Tex and CD8_Tpex states. This is expected, as sustained antigen stimulation in the tumor drives immunodominant tumor-reactive clones towards the exhaustion differentiation path.
The combination of ProjecTILs and scRepertoire simplifies the joint analysis of single-cell expression data and clonotype analysis, in the context of an annotated reference atlas of TIL states.
Further reading
Original publication - Xiong et al. (2019) Cancer Immunol Res
ProjecTILs case studies - INDEX - Repository
The ProjecTILs method - Andreatta et al. (2021) Nat. Commun. and code